Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions
Authors
Abstract
We analyze how modern distributed storage systems behave in the presence of file-system faults such as data corruption and read and write errors. We characterize eight popular distributed storage systems and uncover numerous bugs related to file-system fault tolerance. We find that modern distributed systems do not consistently use redundancy to recover from file-system faults: a single file-system fault can cause catastrophic outcomes such as data loss, corruption, and unavailability. Our results have implications for the design of next generation fault-tolerant distributed and cloud storage systems.
Similar resources
An approach to fault detection and correction in design of systems using of Turbo codes
We present an approach to the design of fault-tolerant computing systems. In this paper, a technique is employed that enables the combination of several codes in order to obtain flexibility in the design of error-correcting codes. Code-combining techniques are very effective, and turbo codes are one such class of codes. The Algorithm-based fault tolerance techniques that to detect errors rely on the c...
Design of a Secure and Fault Tolerant Environment for Distributed Storage
We discuss the design and evaluation of a secure and fault-tolerant storage infrastructure for untrusted distributed computing environments. Previous designs of storage systems for this space have tended to use decoupled mechanisms for achieving fault tolerance and security. Our design, based on cryptographic properties of error-correction codes, combines redundancy (for fault tolerance) and en...
Exploiting Soft Computing for Increased Fault Tolerance
Traditionally, fault tolerance researchers have made very strict assumptions about program correctness. Such strict notions of correctness are appropriate for workloads that are numerically oriented. However, a growing number of important workloads produce results that have a higher (often qualitative) user-level interpretation. We call these soft computations. Examples of soft computations inc...
CSAR-2: A Case Study of Parallel File System Dependability Analysis
Modern cluster file systems such as PVFS that stripe files across multiple nodes have been shown to provide high aggregate I/O bandwidth but are prone to data loss, since the failure of a single disk or server affects the whole file system. To address this problem, a number of distributed data redundancy schemes have been proposed that represent different trade-offs between performance, storage effici...
Proposing an Efficient Software-based Method to Enhance Reliability of Computer Systems against Soft Errors
In recent years, along with rapid developments in technology, computer systems have increasingly become more integrated and more modular. Indeed, the reliability and efficiency of computer systems are of high significance. Hence, the quantitative evaluation of the optimization of reliability indexes in computer systems is considered to be a crucial issue. Reliability enhancement of computer systems...